InfoMagic Internet Tools 1995 April

Text File | 1995-04-24 | 3.7 KB | 99 lines

> Date: Fri, 08 Jan 93 13:57:32 CST
> From: Dan Connolly <connolly@pixel.convex.com>
>
> This question seems to confuse two things: the ISOlat1 entity
> set, and the ISO Latin 1 character set. The first is mapping
> of names to glyphs, and the second is a mapping from the numbers
> 128-255 to glyphs. I think they're in alphabetical order
> by name, but not in order by the ISO Latin 1 character set.

I think we should specify ISO Latin 1 as the base set. I think that
a lot of people in the Nordic countries use it routinely, and they
will go crazy if they have to overload the curly brackets again as
they have to with mail. Therefore, we should allow those people who
have 8-bit capability to just stick in 8-bit codes.

Admittedly, I thought the ISO world kept to the codes 21-7E and A1-FE
hex for the G0 and G1 graphics sets, using the others for control sets
(C0 and C1). Maybe ISO Latin 1 has nothing to do with the ISO 8-bit
extensions. Sorry I can't quote ISO numbers. But whatever is common
usage, let us have an 8-bit set. (Can anybody illuminate us on this?
Has anybody got the ISO Latin 1 character set listing by number?)

Now for dyed-in-the-wool 7-bit hackers, is it fair to require them to
remember numbers, or would it be nicer to allow them to put in codes
using entity names? Some people would, I am sure, like the latter, but
it is NOT important because we are aiming for wysiwyg editors and so
would regard human-readable character names as a temporary thing
anyway.

> Here is the crux of the matter:
>
> >The communication between it and the text object would have to be defined in
> >terms of a particular character set
>
> And this character set is stated in the SGML declaration at
> the top of html.dtd.

No - that is something different. The top of the DTD specifies the
reference base set for the DTD itself and for SGML documents. The
interface between two software modules is something else and can be
independent of that.

> If we define HTML in terms of the
> full ISO Latin 1 character set, then the parser can deal with
> ö, and pass it to the text object as a data character, just
> like an 'A' character. For X displays using iso8559 fonts, that's
> cool.

Sorry, is iso8559 = ISO Latin 1? (I have no head for numbers >1 :-)
Yes, it is cool. Use Midas or Viola to look at the Hyper-G stuff and
it works very nicely.

> But on a PC or a Mac, that means the text object will have to
> scan all the data it gets and convert the Latin1 encoding to
> its own. Yuck.

Yup. Big deal? Not really. Just a set of parallel tables.

Peter Flynn of the CURIA project is producing a lot of archived Gaelic
and is currently dealing with a requirement for a line-mode browser
which can switch its character set depending on the terminal emulator
the reader is using. Problems only occur if there are characters which
can't be mapped 1-1 to the local set and must be represented by more
than one character (like u-umlaut -> ue, ae diphthong -> ae, etc.) AND
you can edit, in which case the original form must be preserved. In
this case, passing on the entity is essential. But doing it for every
character >127 would be a pain memorywise. So I would suggest that a
configurable table define which entities can be crunched down to a
single character in the local representation, and that the rest be
passed on from the SGML parser to the SGML app as external entities.

> >... and perhaps if there is more than one
> >contender the SGML engine could have a compilation option.
>
> Hmmm... One might argue that as long as we support conversion inside
> the SGML parser for EBCDIC machines, we might as well support PC and
> Mac character sets while we're at it.

Yes.

Tim
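
The configurable table suggested above might look something like the
following. This is only a rough sketch in Python, not anything from an
existing parser: the names ISOLAT1, build_local_table and crunch are
made up for illustration, the entity list is a small excerpt, and the
PC character set is assumed to be code page 437. Entities that have a
single character in the local set are crunched down to it; everything
else is passed on untouched as an entity.

```python
# Hypothetical sketch: all names here are illustrative assumptions.

# Entity name -> ISO Latin 1 code (small excerpt of the ISOlat1 set).
ISOLAT1 = {"ouml": 0xF6, "uuml": 0xFC, "aelig": 0xE6, "eth": 0xF0}


def build_local_table(entities, local_codec):
    """Return {entity name: local byte} for entities that map 1-1."""
    table = {}
    for name, code in entities.items():
        try:
            # Latin 1 codes coincide with the first 256 Unicode code
            # points, so chr() gives the right character here.
            local = chr(code).encode(local_codec)
        except UnicodeEncodeError:
            continue  # no local character: leave it for the app
        if len(local) == 1:
            table[name] = local
    return table


def crunch(tokens, table):
    """Replace mappable entities with data; pass the rest on as entities.

    tokens is a list of ("data", bytes) or ("entity", name) pairs,
    an assumed shape for what the parser hands to the application.
    """
    out = []
    for kind, value in tokens:
        if kind == "entity" and value in table:
            out.append(("data", table[value]))
        else:
            out.append((kind, value))
    return out


pc_table = build_local_table(ISOLAT1, "cp437")
tokens = [("data", b"K"), ("entity", "ouml"), ("entity", "eth")]
result = crunch(tokens, pc_table)
```

With the PC table, ouml becomes a single data byte while eth, which
code page 437 lacks, stays an entity for the application to handle.
Building the table against a 7-bit set such as "ascii" leaves it
empty, so every entity is passed on.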